§1  Introduction

In System/U (see [KU]), the database is regarded as a set of objects and a number
of collections of objects, called maximal objects (see [MU1]). These maximal objects can
be regarded as hypergraphs, where the nodes are the attributes of the database and the
edges are the objects. Each maximal object is assumed to be an acyclic hypergraph. The
database designer specifies the objects and maximal objects, and a tree representation for each
hypergraph is then computed. To process a query on the database, the query interpreter first
converts it into some combination of queries, each of which involves only one maximal object. Each
such query is a projection applied to the join of all objects in the maximal object, and can
then be optimized using tableau optimization (see [ASU]). [ASU] also presents an optimization
algorithm for simple tableaux, a class that includes the tableaux derived from hypergraphs. As noted in
[MU2], optimization of tableaux derived from hypergraphs is equivalent to Graham reduction of
the corresponding hypergraph with a given set of sacred nodes. The following is an algorithm to
compute such a reduction, which is more efficient than applying the [ASU] algorithm mentioned
above. The algorithm requires that we already have a tree representing the hypergraph. Such
a representation can be constructed by carrying out an ordinary Graham reduction on the
hypergraph (see [BFMY]). The tree representation is constructed once and for all when defining
the database, and it is then used to guide the algorithm performing the Graham reduction with
sacred nodes.


§2  Definitions 

A hypergraph G is a set of nodes N and a set of edges E, where edges are sets of
nodes. We will be interested only in acyclic hypergraphs. These are defined in, e.g., [BFMY].
We shall not repeat the definition, as it is never used here. We shall use instead two
equivalent properties described below.

Definition: Let G be an acyclic hypergraph, and X a set of nodes in G. The Graham reduction
of G with sacred nodes X, denoted GR(G, X) (see [MU2]), is obtained by carrying out the
following steps, in any order:

(1) If A is an isolated node (i.e., is in only one edge) and is nonsacred (i.e., is not in X),
then delete A.

(2) If R, S are two edges in the hypergraph, such that R ⊆ S, then delete R.

GR(G, X) consists of those nodes and edges that remain when no further reductions can be
made.
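
The definition can be read directly as a fixpoint computation. The following Python sketch is only an illustration of rules (1) and (2), not the tree-guided algorithm of §3; the function name `graham_reduce` and the representation of edges as Python sets are assumptions made for this example. It also assumes that an edge left with no nodes counts as deleted, so that an acyclic hypergraph with no sacred nodes reduces to the empty set.

```python
def graham_reduce(edges, sacred):
    """Naive Graham reduction with sacred nodes (illustrative sketch).

    edges  -- iterable of iterables of node names (hypothetical representation)
    sacred -- iterable of sacred node names
    """
    edges = [set(e) for e in edges]
    sacred = set(sacred)
    changed = True
    while changed:
        changed = False
        # Rule (1): delete nonsacred nodes that occur in exactly one edge.
        for e in edges:
            for a in list(e):
                if a not in sacred and sum(a in f for f in edges) == 1:
                    e.remove(a)
                    changed = True
        # Rule (2): delete one edge that is contained in another edge.
        for i, r in enumerate(edges):
            if any(i != j and r <= s for j, s in enumerate(edges)):
                del edges[i]
                changed = True
                break
    # Assumption: a sole remaining empty edge is treated as deleted.
    return {frozenset(e) for e in edges if e}
```

For the chain of edges {A,B}, {B,C}, {C,D} with sacred nodes A and C, the sketch deletes D, then the edge {C}, and stops with the edges {A,B} and {B,C}.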

Definition: Let G be an acyclic hypergraph. A tree T whose nodes are the edges of G represents
G, iff for all edges R_i and R_j in G and all nodes A of G, if R_k is on the path in T connecting
R_i and R_j, and A is in both R_i and R_j, then A is in R_k.
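
Because paths in a tree are unique, this condition is equivalent to requiring that, for every node A, the edges containing A form a connected part of T. The Python sketch below checks the condition in that form. The names `represents`, `edges` and `tree_adj` are hypothetical, and the sketch assumes that `tree_adj` really describes a tree over the edge names.

```python
def represents(edges, tree_adj):
    """Check whether a tree represents a hypergraph (illustrative sketch).

    edges    -- dict: edge name -> set of nodes in that edge
    tree_adj -- dict: edge name -> list of neighbouring edge names in the tree
    """
    attrs = set().union(*edges.values())
    for a in attrs:
        # All edges that contain the node a.
        holders = {name for name, nodes in edges.items() if a in nodes}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        # Walk the tree, but only through edges that contain a.
        while stack:
            cur = stack.pop()
            for nb in tree_adj[cur]:
                if nb in holders and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if seen != holders:   # some edge containing a is cut off from the others
            return False
    return True
```

With edges R = {A,B}, S = {B,C}, U = {C,D}, the tree R - S - U passes the check, while the tree S - R - U fails it, because C is in S and U but not in R, which lies on the path between them.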

Lemma 1. The following three properties are equivalent:

(1) G is acyclic.

(2) Performing the Graham reduction on G with an empty set of sacred nodes results in
the empty set.

(3) There exists a tree T representing the hypergraph G. ∎

A proof can be found in [BFMY].

Figure 1  Deletion of S in step 1 of the algorithm.

Definition: COVERS(S, R), where S and R are edges in a hypergraph, means that R covers
S, i.e.,

(1) Every isolated node in S is nonsacred.

(2) If A is a nonisolated node in S, then A is in R.

If COVERS(S, R), then S can be deleted in a Graham reduction by first deleting all isolated
nodes in S, and then deleting S using rule (2).
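
As an illustration, the two conditions translate into a short predicate. This is a sketch under assumed representations (edges as Python sets, the hypergraph as a list of such sets); the name `covers` and its parameters are hypothetical. Isolation is tested here by scanning all edges, which is simple but not the efficient counting scheme discussed in §4.

```python
def covers(s, r, edges, sacred):
    """Return True if edge r covers edge s (illustrative sketch).

    s, r   -- sets of nodes, both members of `edges`
    edges  -- list of all edges (sets) currently in the hypergraph
    sacred -- set of sacred nodes
    """
    for a in s:
        isolated = sum(a in e for e in edges) == 1   # a occurs in s only
        if isolated:
            if a in sacred:        # condition (1) violated
                return False
        elif a not in r:           # condition (2) violated
            return False
    return True
```

For the edges {A,B}, {B,C}, {C,D} with sacred nodes A and C, the edge {B,C} covers {C,D} (D is isolated and nonsacred, C is in {B,C}), but it does not cover {A,B}, since A is isolated and sacred.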


§3  An  algorithm  to  compute  GR(G,X). 

Description  of  the  Algorithm. 

We are given a hypergraph G, a set of sacred nodes X, and a tree T representing the hypergraph
G. Apply the following two steps to T:

1: Scan the tree from the bottom up. For each node R, examine its children from left
to right. Let the current child be S, and let its children be Y_j, j = 1, ..., m (if it
has any). Also let R's children to the left of S be X_i, i = 1, ..., l, and those to S's
right be Z_k, k = 1, ..., n. If COVERS(S, R), carry out the transformation in Fig. 1
(S is deleted and the Y_j's become children of R), and continue comparing R with its
children from the marked node in Fig. 1, the leftmost child of S. (If S has no children
then continue from Z_1.) Node R is processed either when we compare R with its
rightmost child and cannot delete the child, or when the rightmost child is a leaf,
and we delete it.
Figure 2  Deletion of R in step 2 of the algorithm.


2: Scan the tree from the top down. For each node R, examine its children from left to
right. Let the current child be S, and let X_i, Y_j, Z_k be as in step 1. If COVERS(R, S),
carry out the transformation in Fig. 2 (R is deleted, S takes its place, and the X_i's and
Z_k's become children of S), and continue from the marked node in Fig. 2, the
leftmost child of S, with R replaced by S. (We do not compare S with the Z_k's.)
A node is processed either when we compare it with its rightmost child and cannot
delete it, or if it is a leaf and we delete its parent.

Note that in Step 2 we do not have to test COVERS(S, Z_k) for k = 1, ..., n, since the
proof will show that this can never hold. Also note that the comparison of a node with
its children is distinct from the top-down processing of its children, so that the Z_k's are all
compared with their children.
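
The two steps can be sketched in Python as follows. This is one possible reading of the description above, not a transcription of the original figures: the class `Node`, the function `reduce_with_tree`, the ordering of the children after a deletion (X's, then Y's, then Z's) and the per-attribute edge counter are all assumptions made for the example. The final line removes the isolated nonsacred nodes, as the main theorem below requires, and treats an edge left empty as deleted.

```python
from collections import Counter

class Node:
    """A tree node holding one hypergraph edge (hypothetical helper)."""
    def __init__(self, attrs, children=()):
        self.attrs = frozenset(attrs)
        self.children = list(children)

def walk(node):
    yield node
    for c in node.children:
        yield from walk(c)

def reduce_with_tree(root, sacred):
    sacred = frozenset(sacred)
    # Number of edges each attribute occurs in; count 1 means isolated.
    count = Counter(a for n in walk(root) for a in n.attrs)

    def covers(s, r):
        # r covers s: isolated attributes of s are nonsacred, all others are in r.
        return all((a not in sacred) if count[a] == 1 else (a in r.attrs)
                   for a in s.attrs)

    def delete(n):
        for a in n.attrs:
            count[a] -= 1

    def step1(r):
        # Bottom-up: children are processed before r itself.
        for c in list(r.children):
            step1(c)
        i = 0
        while i < len(r.children):
            s = r.children[i]
            if covers(s, r):
                delete(s)
                r.children[i:i + 1] = s.children  # resume at s's leftmost child
            else:
                i += 1

    def step2(r):
        # Top-down: r may be replaced by one of its children.
        i, limit = 0, len(r.children)
        while i < limit:
            s = r.children[i]
            if covers(r, s):
                delete(r)
                limit = i + len(s.children)       # the Z_k's are not compared with s
                s.children = r.children[:i] + s.children + r.children[i + 1:]
                r = s                             # s takes r's place
            else:
                i += 1
        r.children = [step2(c) for c in r.children]
        return r

    step1(root)
    root = step2(root)
    # Final clean-up: drop isolated nonsacred attributes and empty edges.
    result = {frozenset(a for a in n.attrs if count[a] > 1 or a in sacred)
              for n in walk(root)}
    return result - {frozenset()}
```

On the chain {A,B} - {B,C} - {C,D} rooted at {A,B}, with sacred nodes A and C, step 1 deletes {C,D} and step 2 deletes nothing, leaving {A,B} and {B,C}. Note that the sketch modifies the tree it is given.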

We  now  prove  a  basic  lemma  required  for  the  main  theorem: 

Lemma 2. Let R, S, T be edges in a hypergraph, where ¬COVERS(R, S) and T ≠ R, S.
If after deleting T, COVERS(R, S) holds, then (see Fig. 3):

(1) There is a node A in the hypergraph, such that A is in R, T and in no other edge.

(2) In any tree representing the hypergraph, one of R and T is the parent of the other.

Proof: Since ¬COVERS(R, S) before deleting T, there must be a node A in R satisfying one
of the following:

(1) A is isolated and sacred.

(2) A is nonisolated and not in S.

The first possibility cannot hold, since if it did A would remain isolated and sacred after deleting
T, and therefore we would have ¬COVERS(R, S) after deleting T.

Therefore the second possibility must be true, which implies that after deleting T we
will still have A in R and not in S, and therefore the only way we can have COVERS(R, S)
is for A to become isolated upon deletion of T. Therefore, the first part of the Lemma must
hold. To show that R and T are adjacent in the tree, assume they are not. Let U be any
(hypergraph) edge on the path in the tree connecting them. Then A must be in U (by the
definition of a tree representing a hypergraph), contradicting the first part of the Lemma. ∎
Figure 3  COVERS(R, S) after deleting T.


The  main  theorem  to  be  proved  is: 

Theorem 3. After applying the above algorithm, we obtain a tree representing a hypergraph
that can be obtained from G by applying steps in a Graham reduction with sacred nodes X.
This hypergraph has the property that if U and V are two of its edges, neither of them covers
the other. Therefore, if we delete all the isolated nonsacred nodes from this hypergraph, we get
GR(G, X).

In order to prove the theorem we will first show two lemmas about the state of the tree after
each step of the Algorithm. In their proofs we use COVERS^1 for the COVERS relation before
deleting an edge, and COVERS^2 afterwards.

Lemma 4. After Step 1 of the above algorithm, we obtain a tree T^(1) with the following
properties:

(1) The tree represents a hypergraph that can be obtained from G by a number of steps of
a Graham reduction with sacred nodes X.

(2) If U and V are nodes in T^(1) and U is a child of V, then ¬COVERS(U, V).

Proof: The proof is by induction on the nodes that have been processed. At each stage, assume
that the tree satisfies (1). Also assume that (2) holds for all V that have been processed, and
for V = R and U to the left of S, where R and S are as in Fig. 1. We show that (1) and (2)
still hold after deleting S, with the new relation COVERS^2, for all V that have been processed
and for V = R and U to the left of the marked node.

In Fig. 1, R is the node we are interested in at this stage of the induction, and S the
child we have compared with R and found that COVERS(S, R).
(1) Since a node S in the tree is deleted only if there is some R such that COVERS^1(S, R),
every deletion is a step in the Graham reduction. To show that the new tree represents
its hypergraph, let U_1, U_2 be nodes in the new tree, A in U_1 ∩ U_2, and V on a path
connecting U_1 and U_2. In most cases, this is also a path in the previous tree, and
therefore A is also in V. The only case when this is not true is when the path includes
one or two of the Y_j's. If it only contains one of them, it can be extended to a path in
the original tree by adding S to the path, and therefore A is again in V. If it connects
two of the Y_j's, replace R in the path by S. If V ≠ R, then we immediately see that
A is in V. Otherwise V = R, in which case A in S and nonisolated (since A is in
both U_1 and U_2) and COVERS^1(S, R) together imply that A is in R.

(2) This will not hold after deleting S only in the following two cases:

(a) ¬COVERS^1(U, V), but COVERS^2(U, V), where U is a child of V.

In this case, Lemma 2 shows that U and S are adjacent. Since V is in the new
tree, V ≠ S, and since U is a child of V the only possibility is U = R. This
implies that V is R's father, and therefore V has not yet been processed.

(b) COVERS^2(U, V), where U becomes a child of V as a result of deleting S.
This can only occur when U = Y_j for some j, and V = R. This is also a pair
that has not yet been processed. ∎

Lemma 5. After applying Step 2 to T^(1), we obtain a tree T^(2) with the following properties:

(1) This tree represents a hypergraph that can be obtained from G by a number of steps
of a Graham reduction with sacred nodes X.

(2) If U, V are nodes in the tree with U a child of V, then ¬COVERS(U, V).

(3) If U, V are nodes in the tree with U a child of V, then ¬COVERS(V, U).

Proof: As in the previous lemma, assume the result holds where V is either a node that has
been already processed, or where V is the current node R, and U is to the left of S. We show
that the results hold after deleting R for the relation COVERS^2, for V that has been processed
and for V = S and U to the left of the marked node.

In the following we shall use F to stand for R's parent, if R has one (see Fig. 4).

Figure 5  ¬COVERS(X_i, S) after deleting R.

(1) Obviously each deletion is a step in a Graham reduction. To show that the new tree
represents its hypergraph, let U_1, U_2 be nodes in the new tree, A ∈ U_1 ∩ U_2, and V
on a path connecting U_1 and U_2. If the path does not go through S, then the path
was also a path in the tree before deleting R, and therefore A ∈ V. The path is also
a path in the previous tree if it connects two Y_j's, and can be extended to one if it
includes only one of the X_i's and Z_k's. The remaining case is when the path connects
two of the X_i's and Z_k's. If we replace S by R we get a path in the original tree.
Therefore if V ≠ S we immediately see that A is in V. If V = S then A in R and
nonisolated, and COVERS^1(R, S) together imply that A is in S.

(2) Let U be a child of V, and suppose COVERS^2(U, V). As in the previous lemma there are two
possibilities:

(a) U and V were not adjacent before deleting R. Then one of the following must
hold:

(i) U is one of the X_i's or the Z_k's and V = S. Both these cases are
proved in the same way, so let us take U = X_i. From step 1 we know
that ¬COVERS^1(X_i, R). Therefore there is a node A in X_i such that
either A is isolated and sacred, in which case A will remain isolated
and so ¬COVERS^2(X_i, S), or A is nonisolated and not in R (see Fig.
5). In that case, since R is on a path in the tree connecting X_i and
S, A cannot be in S. Since A is not in R, A remains nonisolated, and
therefore ¬COVERS^2(X_i, S).

(ii) U = S, V = F. From step 1, ¬COVERS^1(S, R). If this is due to S
containing an isolated sacred node, we immediately get ¬COVERS^2(S, F).
Otherwise there is a nonisolated A in S which is not in R. Then A will
remain nonisolated after deleting R, and since R is on a path connecting
S and F, A cannot be in F. Therefore, ¬COVERS^2(S, F).



(b) U is a child of V (before deleting R), and ¬COVERS^1(U, V). By Lemma 2,
U must then be adjacent to R, and therefore U is one of: (i) X_i, (ii) Z_k, (iii) S, (iv) F.
The first three imply that V = R, which is impossible, since R has just been
deleted. In the fourth case, U = F and V is R's grandparent (call it G). We
have ¬COVERS^1(F, G), from step 1. This is due either to F containing an
isolated sacred node, in which case we immediately have ¬COVERS^2(F, G), or
to F containing a nonisolated node A which is not in G. If COVERS^2(F, G),
then A must become isolated, and therefore must be in R and F only. But
then, COVERS^1(R, S) implies that A is in S, a contradiction.

(3) Let U be a child of V such that COVERS^2(V, U). There are then two possibilities:

(a) U and V were not adjacent before deleting R. This happens when:

(i) V = S and U is one of the X_i's or the Z_k's. Both cases are the
same, so let us assume that U = Z_k. From step 1, we know that
¬COVERS^1(S, R). This is due either to S containing an isolated sacred
node, in which case we immediately get ¬COVERS^2(S, Z_k), or to S
containing a nonisolated node A which is not in R. Since A is not in R,
A remains nonisolated, and since R is on a path between S and Z_k, A
cannot be in Z_k. Therefore ¬COVERS^2(S, Z_k).

(ii) U = S, V = F. Since the algorithm is top-down, ¬COVERS^1(F, R)
must already hold. If this is due to F containing an isolated sacred
node, we immediately get ¬COVERS^2(F, S). Otherwise, F contains a
nonisolated node A that is not in R. Then A remains nonisolated, and
since R is on a path between F and S, A cannot be in S. This shows
that ¬COVERS^2(F, S).

(b) U and V were adjacent before deleting R, with ¬COVERS^1(V, U) before
deleting R and COVERS^2(V, U) afterwards. By Lemma 2, V and R must
be adjacent and therefore V is one of these: (i) X_i, (ii) Z_k, (iii) S, (iv) F. The first
three imply that V has not yet been processed. In the fourth case, V = F
and U is a child of F other than R. If COVERS^2(V, U), Lemma 2 shows that
there is a node A such that A is only in F and R. But COVERS^1(R, S) implies
that A is in S, a contradiction. ∎


Proof of Theorem 3: By the two previous lemmas, after applying the algorithm we obtain
a tree T^(2), with the property that if U is a child of V then ¬COVERS(U, V) and
¬COVERS(V, U). Assume that there are edges U and V such that COVERS(U, V). Let W be
U's immediate successor on a path connecting U and V. Then one of the following occurs:

(a) W is a child of U. Take any A in U. If A is isolated, COVERS(U, V) implies that
A must be nonsacred. If A is nonisolated, then A is in V, which because the tree
represents the hypergraph implies that A must be in W. Therefore COVERS(U, W),
contradicting Lemma 5.

(b) U is a child of W. In the same way, we get COVERS(U, W), a contradiction. ∎




§4  Complexity  of  the  Algorithm. 

Let n = |G| be the number of edges in the hypergraph, and k the size of the largest edge
in G. Both steps of the algorithm compare each node of the tree with its parent at most once
(the first step does so exactly once). Each such comparison consists of two parts:

(a) Test if the COVERS relation holds. This requires testing if each attribute in the edge
is isolated or not. If it is isolated, the COVERS relation does not hold if the attribute
is sacred. If the attribute is nonisolated, we then have to test if the attribute is in
the object above it in the tree. The COVERS relation will not hold if it is not. If
the objects are stored as lists of attributes this requires time proportional to k. If we
maintain a count of the number of edges an attribute is in, the test for isolation requires
constant time, and therefore testing if COVERS holds requires O(k^2).

(b) If the COVERS relation holds, delete a node from the tree. If we represent the
tree using pointers to left and right children and left and right siblings, the deletion
requires constant time. Updating the count of edges for each attribute requires O(k)
time.

The two parts above have to be carried out at most twice for each edge. Therefore the
complexity of the complete algorithm is O(nk^2).

The algorithm in [ASU] for simple tableaux uses different data structures and so
cannot be compared directly with this. It requires O(r^4 c) where r is the number of rows (= n),
and c the number of columns (≥ k) in the tableau. If we apply the reduction algorithm using
similar data structures, i.e., represent the hypergraph as an array so that testing if an attribute
is in an edge requires constant time, but every attribute in the hypergraph must be examined
when testing if COVERS holds, then (a) and (b) both require O(c) time, and therefore the
algorithm takes O(rc) time.
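
To make the array representation concrete, here is a minimal Python sketch of the COVERS test over rows of a tableau-like array. It assumes one boolean slot per attribute in each row, a per-column count of the edges containing the attribute, and a boolean row marking the sacred attributes; the name `covers_array` and this layout are assumptions for the example, not the data structures of [ASU].

```python
def covers_array(s_row, r_row, count, sacred_row):
    """Return True if the edge in r_row covers the edge in s_row.

    s_row, r_row -- lists of booleans, one slot per attribute (column)
    count        -- count[j] = number of edges containing attribute j
    sacred_row   -- sacred_row[j] is True if attribute j is sacred
    """
    for j in range(len(s_row)):      # every column is examined: O(c)
        if not s_row[j]:
            continue                 # attribute j is not in the edge
        if count[j] == 1:            # isolated: must be nonsacred
            if sacred_row[j]:
                return False
        elif not r_row[j]:           # nonisolated: must also be in r
            return False
    return True
```

Each membership test is a constant-time array lookup, so a single call costs time proportional to the number of columns c, which matches the O(c) bound stated above for part (a).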